- ************************************************************************
- * *
- * ASMGEN.COM - by J. Gersbach and J. Damke (Ver. 2.01) *
- * *
- * A program to generate cross-referenced assembly language code *
- * from any executable file. *
- * *
- * *
- * *
- * Uploaded to PCanada by Mark Magner November 23, 1983 *
- * *
- ************************************************************************
-
-
-
- * PREFACE *
-
-
- This program will generate 8086/87/88 assembly code text
- that is compatible with the IBM Personal Computer Macro
- Assembler from any executable diskette file up to 65,535
- bytes. The output can be routed to the console or a disk-
- ette file. A reference list may be generated separately or
- embedded at the appropriate instruction counter address in
- the assembly code.
-
- Some manual touch up will be required before reassembly, but
- nearly all the typing is done for you by ASMGEN and anything
- questionable is marked with "??".
-
- A file of sequential instructions may be resident on the
- same diskette to indicate to ASMGEN which addresses contain
- code, bytes, words, or strings. This file may also include
- instructions to assume segment register values or to toggle
- the output of assembly code text, the generation of the
- reference table, 8087 mnemonics, or the inclusion of embedded
- reference information in the assembly file.
-
- DEBUG may be used to browse through the executable file to
- determine the starting locations of code and data to develop
- the sequential instruction file. It is important to specify
- these locations accurately to obtain an accurate reference
- table and to minimize touching up of the ASM output text.
-
- The number of references within the file determines the amount
- of memory required, since a reference table is built in
- memory during the first pass. Disassembly is done from disk
- and only one file sector is in memory at any given time.
- Therefore memory size does not limit the size of the file
- to be disassembled. 48K bytes of memory will be enough for
- most programs, but a few will need 64K or 128K. One diskette
- drive is sufficient, but two are more convenient.
-
-
- * STARTING ASMGEN *
-
- There are two ways to work with ASMGEN: either by using the
- command menu or by calling ASMGEN with parameters.
- Following are the descriptions of both options.
-
- * USING THE ASMGEN MENU *
-
- The program is invoked by typing: ASMGEN
-
- You are then prompted for a file specification. Respond with
- the name of the executable file from which you wish to
- generate the assembly code. The executable file will normally
- have an extension of .EXE or .COM. ASMGEN will check this
- file spec for validity and then respond with a prompt that
- includes a summary of the command letters indicating that
- you may give it a command. The executable file contents
- are not checked for valid code and ASMGEN will try to dis-
- assemble text or compressed BASIC files and produce unintell-
- igible assembly code.
-
- The commands are:
-
- X filespec This file spec replaces any previous executable
- file spec. The usual file extension is .COM
- or .EXE
-
- EXAMPLE: X DATE.COM
-
-
- A <filespec> The executable file is disassembled and the assem-
- bly code is routed to the specified file. The
- usual file extension is .ASM. If the filespec is
- omitted, the output will default to the console.
-
- EXAMPLE: A DATE.ASM
-
- R <filespec> The reference table is sent to the file specified.
- The usual file extension is .TBL. If the filespec
- is omitted, the output will default to the console.
-
- EXAMPLE: R DATE.TBL
-
- Q The program is terminated and control returned to
- DOS.
-
-
- Each time a command has been executed, ASMGEN waits with a one line
- prompt for the next command.
-
- X <filespec>, A <CON>, R <CON> or Q ?
-
- The default filespec for each command is shown in brackets. Enter
- the next command of your choice as described above.
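-
- The prompt syntax above is simple enough to sketch in a few lines. The
- following Python fragment is a hypothetical illustration, not ASMGEN's own
- code: the name parse_command and the rule that X must be given a filespec
- are assumptions; the console default for A and R comes from the prompt.
-
```python
# Hypothetical sketch of ASMGEN's one-line command prompt (not its real code).
# Assumption: X needs a filespec; A and R fall back to the console (CON).

DEFAULT_SPEC = {"A": "CON", "R": "CON"}


def parse_command(line):
    """Return (command letter, filespec or None) for one prompt response."""
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("empty command")
    cmd = parts[0].upper()
    if cmd not in ("X", "A", "R", "Q"):
        raise ValueError("unknown command: " + cmd)
    spec = parts[1].strip() if len(parts) > 1 else None
    if cmd == "Q":
        return cmd, None  # Q returns to DOS and takes no filespec
    if cmd == "X" and not spec:
        raise ValueError("X requires an executable filespec")
    return cmd, spec or DEFAULT_SPEC.get(cmd)
```
-
- For example, answering the prompt with "A" sends the listing to the console,
- while "A DATE.ASM" writes it to a file.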
-
-
- * USING ASMGEN WITH PARAMETER CALLS *
-
- Up to three file specifications may be included when ASMGEN is
- first called from DOS. The executable file's name is given first,
- followed by specifications for the assembly and reference table
- files.
-
- EXAMPLE: ASMGEN DATE.COM, DATE.ASM, DATE.TBL
-
- If a semicolon follows the last filespec, ASMGEN will exit to DOS
- when the command has been executed. If no semicolon is entered,
- ASMGEN will display the menu options described above and wait for
- further input after executing the command.
-
- EXAMPLE: ASMGEN DATE.COM, DATE.ASM;
-
- If the filespec for the .ASM file and/or .TBL file is omitted,
- ASMGEN derives the missing name from the filename of the first
- filespec. The .ASM file is generated first, then the .TBL file.
-
- EXAMPLE: ASMGEN DATE.COM,,; creates DATE.ASM and DATE.TBL and exits
- to DOS.
-
- If only the reference table is desired, the dummy name NUL should be
- entered in place of an .ASM filespec.
-
- EXAMPLE: ASMGEN DATE.COM, NUL, DATE.TBL
-
- If only one filespec is given when the program is called, the reference
- table is built in memory and then the menu options are displayed for
- further commands.
-
- EXAMPLE: ASMGEN DATE.COM
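-
- The parameter rules above can be modelled as follows. This is a sketch under
- stated assumptions (parse_tail is a hypothetical name): a field that is
- present but empty, as in "DATE.COM,,;", receives a default name built from
- the executable's base name, while fields after the last comma are treated as
- not requested.
-
```python
import ntpath

# Hypothetical sketch of the command-tail rules (parse_tail is not ASMGEN code).
# Assumptions: an empty field between commas gets a default name derived from
# the executable's base name; fields after the last comma are not requested.


def parse_tail(tail):
    """Return (exe, {"ASM": name, "TBL": name}, exit_after)."""
    tail = tail.strip()
    exit_after = tail.endswith(";")  # trailing semicolon: return to DOS afterwards
    if exit_after:
        tail = tail[:-1]
    fields = [field.strip() for field in tail.split(",")]
    exe = fields[0]
    base = ntpath.splitext(exe)[0]
    outputs = {}
    for ext, field in zip(("ASM", "TBL"), fields[1:]):
        name = field or base + "." + ext
        if name.upper() != "NUL":  # NUL suppresses this output
            outputs[ext] = name
    return exe, outputs, exit_after
```
-
- Under these assumptions "DATE.COM, DATE.ASM;" requests only the assembly
- file, and "DATE.COM" alone requests no output, so only the reference table
- is built before the menu appears.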
-
-
- * PROGRAM EXECUTION *
-
- The disassembly is done in two passes through the source file. On pass
- #1, the reference table is built in memory and the actual output is gen-
- erated during pass #2. Once the reference table is established, it remains
- in memory until an X or Q command is issued, and subsequent A and R com-
- mand executions skip pass #1. This saves a lot of time when the executable
- file is large.
-
- Three contiguous data areas are built dynamically in memory during pass #1.
- First is the compressed sequential instruction list. Second, for .EXE files,
- is a list of pointers to the locations of all relocatable values in the
- program, also arranged in numerical order. These first two areas are
- established before any code is read. Third, the reference table is built
- in a higher area of memory as pass #1 progresses.
-
- If all available memory in the program segment is filled before the first
- two data areas are completed, ASMGEN will abort to the command prompt.
- After the reference table is started, a shortage of memory will produce
- the message "Reference Table Incomplete Due to Insufficient Memory" and
- continue.
-
- Ctrl-Break may be used at any time to interrupt a command in progress.
-
-
- * READING THE ASSEMBLY CODE FILE (.ASM) *
-
- This file begins with a title taken from the executable file's name and
- date followed by the current date (in brackets).
-
- If not inhibited by the M switch in a SEQ file (explained later), the macro
- library will appear next in the file.
-
- Next will be a .RADIX 16 pseudo-op which tells the macro assembler that all
- numbers are in hexadecimal form.
-
- Then comes a header that indicates a starting value for the code segment,
- stack segment, instruction pointer and the stack pointer. The stack pointer
- is usually set to FFFF for .COM files but may be somewhat less depending on
- available memory. For .EXE files, these values are taken from the header
- that the linker writes into the file.
-
- The first ASSUME statement might come next. There is one generated for each
- segment that begins with code. All segment registers are designated according
- to the current set of ASSUMEs. They will sometimes be incorrect, so all
- ASSUME statements should be checked prior to re-assembly.
-
- The disassembled output follows, terminated by an END statement and the
- execution address. An ORG pseudo-op is included if required.
-
- The text is compatible with the IBM Macro Assembler and the format is the same
- except for RETurns. To avoid the need for PROCedure titles, special mnemonics
- are provided for all RET instructions. These are defined in the macro library
- at the beginning of the file. Only macros that are needed for the current file
- are produced. The optional embedded reference table entries enhance the
- readability of the file. For very large files, however, they can be
- undesirable, and a separate reference table is best.
-
- When invalid instructions are encountered in code areas, they are reproduced
- as byte values followed by "??". If a near jump targets a label defined
- earlier in the code and the target is within range of a short jump, a NOP
- instruction is inserted after the jump, because the Macro Assembler will
- encode such a jump in the shorter form. The executable file created from this
- .ASM file with the Macro Assembler and Linker will then be the same length as
- the original file. This makes it less important to differentiate between
- labels and numeric constants, since the label values and their offsets within
- the file will be the same. The fundamental problem of disassembly is knowing
- whether the original assembly code defined a number as a label, which changes
- as a function of its position, or as a constant that always remains the same.
- If you make changes in the assembly code, however, you must properly specify
- all values. You might as well remove all NOPs at the same time.
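-
- The NOP rule rests on standard 8086 encodings: a near JMP (opcode E9 with a
- 16-bit displacement) takes three bytes, a short JMP (EB with an 8-bit
- displacement) takes two, and the displacement is measured from the start of
- the next instruction. A minimal sketch of the range test, assuming only
- backward (previously defined) targets qualify as described above; needs_nop
- is a hypothetical name:
-
```python
# Sketch of the test behind the inserted NOP, assuming standard 8086
# encodings: near JMP = E9 + 16-bit displacement (3 bytes), short JMP =
# EB + 8-bit displacement (2 bytes). Displacements are measured from the
# address of the next instruction.

SHORT_JMP_LEN = 2


def needs_nop(jump_addr, target):
    """True if a 3-byte near JMP at jump_addr to a previously defined target
    fits the short form, so the reassembled jump is one byte shorter."""
    if target > jump_addr:
        return False  # forward target: not covered by the rule in the text
    disp = target - (jump_addr + SHORT_JMP_LEN)
    return -128 <= disp <= 127
```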
-
- Labels are five characters long and begin with "L". Segment labels begin with
- "S". The remaining four characters are the current instruction counter in hex
- form, which makes each label unique and shows its location in the original
- file. The instruction counter is continuous throughout the assembly code and
- is not reset at segment boundaries, so segment labels are expressed as byte
- addresses rather than paragraph numbers. Where a label value is modified by
- an ASSUME statement, the original value is included as a comment in the
- referencing instruction so that it may easily be changed back if it was not
- intended as a location.
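-
- The naming scheme is easy to reproduce. The helpers below are a hypothetical
- sketch, not part of ASMGEN; segment_label assumes the segment start is given
- as a paragraph number relative to the start of the file.
-
```python
# Hypothetical helpers reproducing the label naming scheme described above.


def label_name(addr, segment=False):
    """'L' (or 'S' for a segment start) + 16-bit instruction counter in hex."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("the instruction counter is a 16-bit value")
    return ("S" if segment else "L") + format(addr, "04X")


def segment_label(paragraph):
    """Segment labels use byte form: paragraph number times 16 (assumed to be
    relative to the start of the file)."""
    return label_name(paragraph * 16, segment=True)
```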
-
- The word "Relocatable" is printed at the end of any line that contains an
- absolute paragraph value. These are values that DOS modifies after loading but
- before executing a program. They are used for loading segment registers and
- depend on where the program is located in memory. Relocatable values are not
- modified by ASSUMEs. ASMGEN converts these numbers from paragraph to byte
- values by multiplying them by sixteen so that they will fit within the 16-bit
- instruction counter field. When the paragraph value is negative or exceeds
- 0FFFH, it is left unchanged and a warning (??) is issued on that line. A
- program larger than 64K bytes should be divided into smaller files before it
- is disassembled.
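-
- A sketch of the paragraph-to-byte conversion, assuming the relocatable operand
- is read as a 16-bit word and "negative" means its sign bit is set. The
- function name reloc_to_byte is hypothetical.
-
```python
# Sketch of the paragraph-to-byte conversion for relocatable operands
# (reloc_to_byte is a hypothetical name). The operand is taken as a 16-bit
# word; "negative" is assumed to mean that its sign bit is set.


def reloc_to_byte(word):
    """Return (value, flagged): value * 16 if it fits, else (word, True)."""
    signed = word - 0x10000 if word & 0x8000 else word
    if signed < 0 or signed > 0x0FFF:
        return word, True  # left unchanged; ASMGEN marks such a line with ??
    return signed * 16, False  # at most 0FFF0H, so it fits in 16 bits
```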
-
- All words are produced as labels, except when the "L" switch has been turned
- off in the .SEQ file (explained later). The label name indicates its numeric
- value. If the value does not fall on an instruction boundary, its position
- relative to the current instruction pointer is given by an EQU statement.
- The Macro Assembler will therefore assume that it is a location, but it is
- easily changed to a constant since the value is given in the label name. The
- word OFFSET precedes a label whenever it is questionable whether it is a label
- or an immediate value. You must decide which of the labels should be
- constants and which of the constants should be labels, and change them
- accordingly. When changing labels to numbers, be sure to append an "H" if the
- number ends with a "D" or a "B", since the Macro Assembler will otherwise
- assume that it is decimal or binary.
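-
- A sketch of the label-to-constant edit (label_to_constant is a hypothetical
- helper), assuming the .RADIX 16 pseudo-op is in effect. Besides the "H" rule
- above, it prefixes a 0 when the first hex digit is a letter, since the Macro
- Assembler otherwise reads a token such as A000 as a symbol name rather than a
- number.
-
```python
# Hypothetical helper for turning an ASMGEN label back into a constant,
# assuming the .RADIX 16 pseudo-op is in effect (i.e. /H was not used).


def label_to_constant(label):
    digits = label[1:].upper()  # drop the leading "L"
    if digits[0] in "ABCDEF":
        digits = "0" + digits  # a number must start with a digit
    if digits[-1] in "BD":
        digits += "H"  # otherwise read as a binary or decimal suffix
    return digits
```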
-
- Bytes are always treated as constants. An optional switch may be included in
- the .SEQ file (explained later) which enables numbers instead of labels if all
- references to the value are data segment and immediate operation types.
-
- An effective procedure to follow in attempting to understand the assembly code
- file is to look first for the message text area, the input commands, and the
- simpler subroutines. Then add label names to addresses in the .SEQ file
- (explained later) that remind you of their purpose. Add comments to the
- labels. If these names are well chosen, the larger routines eventually will
- become clear. The embedded references are produced as labels so they will
- retain their meanings as they are changed.
-
- It is also helpful to spend some time studying the structure of data areas.
- Vector tables, which are frequently used to control the program's flow, reveal
- the program's structure very quickly. If some routines do not have labels at
- the beginning, it is usually because the code or tables that reference them
- (or the segment register assumptions) are not properly defined in the .SEQ
- file.
-
-
- * READING THE REFERENCE TABLE (.TBL) *
-
- A referencee is defined as a number that is referenced somewhere in the
- program. It may be a program location or a numeric constant.
-
- A referencor is defined as the address in the program from which a refer-
- ence is made to the referencee.
-
- Each entry is composed of a referencEE followed by a list of referencors. If
- more than one line is needed, additional lines are indented to the first
- referencor position. The referencEE is followed by an "S" if it includes
- references to the beginning of a segment. The referencor is followed by two
- letters, the first of which represents the segment register that is implied
- or prefixed in the referencing instruction. The second letter indicates the
- type of operation on the referencEE. When the reference entries are embedded
- in the assembly code, all values are preceded with the letter "L".
-
- ----------------------------------------------------------------------------
- 1st letter | 2nd letter
- SEG REGISTER | TYPE OF OPERATION
- ----------------------------------------------------------------------------
- C code | J jump M modify - INC, ADD, etc.
- S stack | C call I immediate - value or offset
- D data | R read T test or compare
- E extra | W write ? unknown or ESC instruction
- | P port
- ----------------|-----------------------------------------------------------
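-
- The two-letter referencor codes can be decoded mechanically from the table
- above. A minimal sketch (decode_code is a hypothetical name; the exact column
- layout of a .TBL line is not assumed here, only the two code letters):
-
```python
# Decoding the two code letters that follow each referencor, taken from the
# table above (decode_code is a hypothetical name).

SEGMENT = {"C": "code", "S": "stack", "D": "data", "E": "extra"}
OPERATION = {
    "J": "jump",
    "C": "call",
    "R": "read",
    "W": "write",
    "P": "port",
    "M": "modify",
    "I": "immediate",
    "T": "test or compare",
    "?": "unknown or ESC instruction",
}


def decode_code(code):
    """'DR' -> ('data', 'read'): 1st letter = segment, 2nd = operation."""
    seg, op = code.upper()  # raises ValueError unless exactly two letters
    return SEGMENT[seg], OPERATION[op]
```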
-
-
-
- * WRITING/READING THE SEQUENTIAL INSTRUCTION FILE (.SEQ) *
-
- The sequential instruction file is a list of special instructions to ASMGEN
- which the user creates. The file takes the form of a list of hexadecimal
- addresses and single-letter instructions or generation switches. If used,
- the .SEQ file must be on the same diskette as the source file and have the
- same name as the source file with an extension of .SEQ. Each instruction in
- the file must be in one of the following formats:
-
- addr command
- or
- addr command ;comment
- or
- addr command label comment
- or
- addr command label comment ;comment
-
- "addr" represents the instruction pointer value. All addr values must be in
- numerical sequence in the file.
-
- "command" may be either a toggle switch or a generation instruction.
-
- "label" is optional and replaces the label generated for this address with
- this non-blank string.
-
- "comment" is optional and must be preceded by "label" unless the dummy label
- "." is used. Everything following "label" is treated as an address comment
- and will be printed in the ASM file behind the generated instruction. The
- address comment may be up to 255 characters in length and should not contain
- a semi-colon.
-
- ";comment" is optional. Anything following a semi-colon in the .SEQ file
- instructions is considered as a comment in the .SEQ file only and is not added
- to the generated .ASM file.
-
- "label" and "comment" are not allowed when a generation switch is coded, but
- a ";comment" may be used to help clarify the .SEQ file.
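-
- The line formats above can be parsed with a short routine. The sketch below
- is an illustration under assumptions, not ASMGEN's own parser: it skips
- continuation lines that begin with "&" (those belong to a preceding A or X
- instruction), folds labels to upper case as in the example file later in
- this document, and accepts repeated addresses as long as they never decrease.
-
```python
from typing import NamedTuple, Optional


class SeqLine(NamedTuple):
    addr: int
    command: str
    label: Optional[str]
    comment: Optional[str]


def parse_seq_line(line):
    """Parse 'addr command [label [comment]] [;comment]'; None if nothing to do."""
    text = line.split(";", 1)[0].strip()  # ;comment is for the .SEQ file only
    if not text or text.startswith("&"):
        return None  # blank/comment line, or continuation of an A or X line
    parts = text.split(None, 3)
    if len(parts) < 2:
        raise ValueError("missing command: " + line)
    addr_text = parts[0]
    if addr_text[-1] in "Hh":
        addr_text = addr_text[:-1]  # an "H" may follow the hex address
    addr = int(addr_text, 16)
    command = parts[1].upper()
    if command.startswith("/") and len(parts) > 2:
        raise ValueError("label/comment not allowed with a switch: " + line)
    label = parts[2].upper() if len(parts) > 2 else None
    if label == ".":
        label = None  # dummy label: keep the generated one
    comment = parts[3] if len(parts) > 3 else None
    return SeqLine(addr, command, label, comment)


def parse_seq(text):
    """Parse a whole .SEQ file, checking that addresses never decrease."""
    entries = []
    for line in text.splitlines():
        entry = parse_seq_line(line)
        if entry is None:
            continue
        if entries and entry.addr < entries[-1].addr:
            raise ValueError("address %04X out of sequence" % entry.addr)
        entries.append(entry)
    return entries
```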
-
- The .SEQ file is read into memory before the first pass starts. The addresses
- and commands are stored in compressed form, but each "label" and "comment" is
- held in memory exactly as written. As a result, the memory required for
- disassembly increases with each "label" and "comment" added to the .SEQ file.
-
-
- * DESCRIPTION OF GENERATION SWITCHES *
-
- THE VARIOUS TOGGLE SWITCHES ARE SET TO ON BY DEFAULT. Switches may be toggled
- on and off at any point in the .SEQ file/disassembly.
-
- All option switches except /M and /H can be either toggled or directly set by
- the user. A suffix of "+" turns the switch ON, and a suffix of "-" turns the
- switch OFF. Switches encountered in the file that have neither of these
- suffixes are toggled to the opposite of their state at the time; ON switches
- are turned OFF and OFF switches are turned ON.
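-
- The toggle rules can be expressed as a small state update. In this sketch
- (names hypothetical) /B, /E, /F, /L, /O, /R and /T start ON and accept "+",
- "-" or no suffix, while /H and /M are one-shot flags that are simply set when
- they appear, as described for those two switches below.
-
```python
# Sketch of the switch rules (names are hypothetical). /B /E /F /L /O /R /T
# start ON and accept "+", "-" or no suffix; /H and /M start off and are
# simply set when they appear.

TOGGLES = set("BEFLORT")
ONE_SHOT = set("HM")  # /H: append "H"; /M: suppress the macro library


def initial_switches():
    state = {name: True for name in TOGGLES}
    state.update({name: False for name in ONE_SHOT})
    return state


def apply_switch(state, token):
    """Return a new switch state after one /X, /X+ or /X- token."""
    token = token.strip().upper()
    if len(token) < 2 or token[0] != "/":
        raise ValueError("not a switch: " + token)
    name, suffix = token[1], token[2:]
    new = dict(state)
    if name in ONE_SHOT:
        new[name] = True  # set once; stays set until the next .SEQ file
    elif name in TOGGLES and suffix == "+":
        new[name] = True
    elif name in TOGGLES and suffix == "-":
        new[name] = False
    elif name in TOGGLES and suffix == "":
        new[name] = not state[name]  # plain switch toggles
    else:
        raise ValueError("unknown switch: " + token)
    return new
```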
-
- /B - generate byte references
-
- When ON, byte and word references are included in the reference table. When
- OFF, only word references are generated.
-
- /E - embedded references in ASM file
-
- When ON, reference table entries are inserted in the text just before the
- referencee's definition statement. When OFF, these entries are not included
- with the disassembled text. The entire reference table can be printed with
- the "R" command.
-
- /F - 8087 mnemonics
-
- When ON, ESC instructions are produced. When OFF, ESC instructions are assumed
- to be 8087 instructions and 8087 mnemonics are produced.
-
- /H - append hex "H"
-
- When this switch appears at any point in the .SEQ file, an "H" is appended to
- all hex numbers. This does not, of course, apply to the labels which are
- hex values preceded by the letter "L". The .RADIX 16 pseudo-op is omitted
- which allows the assembler's radix to default to decimal. This switch defaults
- to NO H APPEND. Note that it will be set only once. It retains its value
- until the next .SEQ file is read.
-
- /L - generate label or number
-
- When ON, all word references are treated as labels. When OFF, a word reference
- is treated as a constant if all referencors are data immediate types.
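-
- The /L decision reduces to a single test over a referencee's referencor codes.
- A sketch, assuming the codes are available as two-letter strings such as "DI"
- (data segment, immediate); as_constant is a hypothetical name:
-
```python
# Sketch of the /L rule (as_constant is a hypothetical name): with /L ON every
# word reference is a label; with /L OFF a word becomes a constant only when
# every referencor is a data-segment immediate reference ("DI").


def as_constant(codes, l_switch_on):
    if l_switch_on:
        return False
    codes = [code.upper() for code in codes]
    return bool(codes) and all(code == "DI" for code in codes)
```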
-
- /M - suppress macro library
-
- When this switch appears at any point in the .SEQ file, no macro library is
- included in the text output. The DEFAULT IS THAT THE MACRO LIBRARY WILL BE
- INCLUDED. Note that this switch will be set only once. It retains its
- value until the next .SEQ file is read.
-
- /O - control ASM output
-
- When ON, ASMGEN will output the generated text. When OFF, output will be
- suppressed.
-
- /R - control TBL output
-
- When ON, ASMGEN will output the generated reference data. When OFF, the
- reference table is not printed.
-
- /T - control trace output
-
- When ON, up to 16 bytes of object code are included as comments in each line
- of the assembly code file. When OFF, object code is not included.
-
-
- * DESCRIPTION OF .SEQ FILE COMMANDS *
-
- A - assume
-
- The following lines contain ASSUMptions for segment register values. They
- become effective at the address specified by this instruction and may be
- modified anywhere in the disassembly. The required format for assumptions is:
-
- & 0400 DS
-
- The ampersand indicates a continuation of the A instruction.
-
- In this example, a data segment beginning at an instruction pointer value of
- 400 will be assumed until another A instruction changes it. CS, ES, and
- SS are also supported. The segment assumptions are used for effective address
- calculations only. The code segment assumption does not affect the instruction
- pointer value.
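-
- The effect of an assumption on an operand can be sketched as follows. This is
- an interpretation, not ASMGEN's code: it assumes the assumed segment start is
- added to the operand offset (modulo 64K) to form the label, and that the
- original offset is kept for the comment mentioned under the .ASM file
- description. rebase_operand is a hypothetical name.
-
```python
# Interpretation of an ASSUME such as "& 0400 DS" (rebase_operand is a
# hypothetical name, not ASMGEN code): the assumed segment start is added to
# the operand offset, modulo 64K, and the original offset is kept as a comment.


def rebase_operand(offset, seg_reg, assumes):
    """Return (label, original value for the comment, or None)."""
    base = assumes.get(seg_reg.upper())
    if base is None:
        return "L%04X" % offset, None  # no assumption for this register
    addr = (base + offset) & 0xFFFF
    return "L%04X" % addr, "%04X" % offset
```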
-
- B - bytes
-
- The bytes encountered in the source file are assumed to have meaning as single
- byte values.
-
- C - code
-
- The bytes encountered in the source file are assumed to be valid 8088 machine
- language instructions.
-
- D - generate data operand
-
- The operand of the instruction is changed to immediate data. Subsequent bytes
- are interpreted as "C" (code follows).
-
- I - initial value for IP
-
- The hexadecimal value on this line overrides the instruction pointer value at
- the beginning of the file - not to be confused with the address at which
- execution begins. The default values are 0000 for EXE files and 0100H for COM
- and other files. The execution address following the END statement is omitted
- if this option is invoked.
-
- S - strings
-
- The bytes encountered in the source file are assumed to form text. Quoted text
- is produced for valid ASCII characters and byte values for others.
-
- # - defined length strings
-
- The first byte encountered in the source file contains the length of the
- character string which begins with the next encountered character. This length
- value may be overridden by a subsequent SEQ file instruction.
-
- $ - defined length strings
-
- The first byte encountered in the source file contains the length of the
- character string which begins with the next encountered character plus the
- length byte itself. This length value may be overridden by a subsequent SEQ
- file instruction.
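-
- The only difference between # and $ strings is whether the length byte counts
- itself. A minimal sketch (read_counted_string is a hypothetical helper):
-
```python
# Sketch of reading "#" and "$" strings (read_counted_string is hypothetical).
# For "#" the length byte counts only the characters; for "$" it also counts
# itself.


def read_counted_string(data, pos, includes_length_byte):
    """Return (characters, position just after the string)."""
    count = data[pos] - (1 if includes_length_byte else 0)
    if count < 0:
        raise ValueError("a length of 0 cannot include the length byte")
    start = pos + 1
    end = start + count
    if end > len(data):
        raise ValueError("string runs past the end of the data")
    return data[start:end], end
```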
-
- W - words
-
- Pairs of bytes encountered in the source file are assumed to have meaning
- as word values.
-
- X - repeating data structure
-
- A cyclic data structure is assumed to begin at the specified instruction
- pointer value. The structure definition may follow and is prefixed by
- an ampersand (&) to indicate the continuation of this instruction. If the
- definition does not follow, then the most recent definition is used. If no
- structure is yet defined, then an error message is displayed.
-
- The following elements may be used to define the structure:
-
- & NNNN S - The next NNNN bytes are defined as string characters
- & NNNN B - The next NNNN bytes are defined as byte values
- & NNNN W - The next NNNN bytes are defined as word values
- & XXNN $ - The next sequence of bytes is defined as NN fields. Each field
- consists of a length byte followed by a string of characters. The length
- of each field is contained in its first byte. The high byte (XX), if
- non-zero, is a bit mask that selects the length field within that byte.
- The length byte is sent to the output file unchanged; the masked length
- field is then right-justified to obtain the field length.
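-
- The masked length field of the $ element can be decoded as below. This sketch
- assumes, following the example file later in this document, that each
- decoded length includes the length byte itself; the helper names are
- hypothetical.
-
```python
# Sketch of the "& XXNN $" element (helper names are hypothetical). XX masks
# the length bits inside each length byte; the masked bits are shifted down
# (right-justified). Assumed: each length includes the length byte itself.


def field_length(length_byte, mask):
    if mask == 0:
        return length_byte  # no mask: the whole byte is the length
    shift = (mask & -mask).bit_length() - 1  # position of the mask's lowest bit
    return (length_byte & mask) >> shift


def split_fields(data, pos, spec):
    """spec is the XXNN value, e.g. 0x0E02: mask 0E, two fields."""
    mask, count = spec >> 8, spec & 0xFF
    fields = []
    for _ in range(count):
        total = field_length(data[pos], mask)
        if total < 1 or pos + total > len(data):
            raise ValueError("bad field length at offset %d" % pos)
        fields.append(data[pos + 1:pos + total])  # characters after the length byte
        pos += total
    return fields, pos
```
-
- With the mask 0E from the example file, a length byte of 0AH gives a field of
- five bytes: the length byte and four characters.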
-
-
-
- * EXAMPLES OF .SEQ COMMANDS *
-
- This example .SEQ file shows all the possible instructions in the appropriate
- format.
-
- ;All switches are on at the beginning.
- 0 /T ;no object code as comments in output
- 0 /M ;no macro library in output
- 0 /H ;append "H" to all numbers
- 00H /A ;assume the following segment values
- ;Note that the ampersand (&) indicates the extended ASSUME
- & 380 DS ;the data segment starts at 380 hex
- & 380 ES ;the extra segment starts at 380 hex
- 0200 I ;initialize the instruction pointer to 200
- 0200 /F ;introduce 8087 mnemonics (not ESC)
- 0200 /E ;no embedded references
- 0200 C ;code begins at 200
- 0203H W ;words are at 203
- 0207 C ;more code starting here
- 220 X ;complex data structure begins here
- & 3 W ;words
- & 1 B ;byte
- & 0E02 $ ;2 strings starting with the 2nd byte follow
- ;bits 3,2,1 of the first byte contain the length of the
- ;string including the length byte.
- ;the high byte (0E) is the mask.
- ;see also the # and $ commands above
- & 1 B ;byte
- ;the structure repeats until 351
- 351 B ;bytes
- 358 C ;more code
- 380 S ;strings - list of messages
- 421 W ;words
- 4FD /B ;no further byte references
- 502 /R ;garbage here - turn off reference generation
- 502 /O ;and output
- 600H /O+ ;valid code - turn output back on
- 600 /R
- 600 C
- 1A60 /O- ;output file about to fill diskette - turn output off but keep
- ;scanning for references.
- ;another run will be needed to get the remaining code.
- 1B00 /D ;treat operand as immediate data
- 1DFD /B+ ;continue with byte references
- 1F45 W user_prt ;user provided labels will translate
- 2256 S $MSG ;to upper case
-
-
- Comments may be included if preceded by a semicolon.
-
- Alphabetic characters may be either upper or lower case.
-
- An "H" may follow the hex address.
-
-
-
- * SAMPLE SESSION *
-
- The external command CHKDSK.COM will serve as an example for this sample
- session because it is short. The .SEQ file is also short and easy to generate.
- Only these few instructions are needed.
-
-
- 0100 /T ;include object code as comments in .ASM file
- 0100 /E ;simpler output without references
- 04F7H S ;messages
- 04F7H /H ;append "H" to numeric values
-
- Using DEBUG, browse through CHKDSK.COM to see how this was arrived at.
- Usually, but not always, the best procedure is to assume code. If the code
- appears unintelligible, display it in hex/ASCII. If it is not text, assume
- bytes. Label positions in the first disassembly may indicate that some
- locations should be words. Next, generate the .ASM file by typing
-
- ASMGEN CHKDSK.COM <enter>
- A <enter>
-
- The assembly code can be viewed on the screen. Then type
-
- A CHKDSK.ASM <enter>
-
- to save the assembly source code to a file. Then,
-
- R CHKDSK.TBL <enter>
-
- to save the cross-reference table to disk.
-
- The Macro Assembler, LINK, and EXE2BIN could now be used to assemble
- CHKDSK.ASM, link it into an .EXE file, and convert that to a .COM file. No
- modification should be necessary in this case.
-
- If working with code that is to be modified, the symbol types must be correctly
- specified as locations or as constants. If they are constants, place them
- outside of any segment. The label names may then be changed to make the code
- more readable.
-
-